The Perils of Ignoring Data Suitability - The Suitability of Data used to Train Neural Networks Deserves More Attention

Author

  • Kevin Swingler
Abstract

The quality and quantity of the data used for a machine learning task (together referred to here as suitability) are as important as the capability of the machine learning algorithm itself. Yet these two aspects of machine learning are not given equal weight by the data mining, machine learning and neural computing communities. Data suitability is largely ignored compared to the effort expended on learning algorithm development. This position paper argues that some of the new algorithms and many of the tweaks to existing algorithms would be unnecessary if the data going into them were properly pre-processed, and calls for a shift in effort towards data suitability assessment and correction.


Similar resources

Groundwater level prediction for the Shahrood plain using a radial basis function artificial neural network

Groundwater level prediction is an important issue in scheduling and managing water resources. A number of approaches, such as stochastic models, fuzzy networks and artificial neural networks, have been used for such prediction. A neural network model has been employed in this research for Shahrood plain groundwater level prediction. For this reason, statistical parameters of groundwater level fluct...


Estimating the habitat suitability of the genus Alosa in the Caspian Sea using the PATREC method and presence data

Many habitat evaluation methods rely on abundance data, which are not available for many species. However, some websites provide species presence data based on published studies. The present study used the PATREC method to estimate the habitat suitability of the Caspian Sea for the genus Alosa. The PATREC method needs abundance data to calculate the pri...


Prediction of Pervious Concrete Permeability and Compressive Strength Using Artificial Neural Networks

Pervious concrete is a concrete mixture prepared from cement, aggregates, water, little or no fines, and in some cases admixtures. The hydrological property of pervious concrete is the primary reason for its reappearance in construction. Much research has been conducted on plain concrete, but little attention has been paid to porous concrete, particularly to the analytical prediction modeling o...


Green Space Suitability Analysis Using Evolutionary Algorithm and Weighted Linear Combination (WLC) Method

With current new urban developments, no balance can be found between green spaces and open areas present within urban networks and natural land patterns since urban networks are dominating ecological networks. Accordingly, one of the major tasks of urban and regional planners is the optimal land use allocation to urban green spaces. Therefore, to achieve this goal in this research, locations of...


Usability evaluation of the Farabar statistical information system in the country's medical sciences universities

Introduction: Statistical information systems are used to gather, document, classify and analyze the data and statistics of an organization and to distribute these statistics to upper managers. Medical sciences universities in Iran use such a system, called Farabar. This study was conducted to evaluate the usability of this national statistical information system in medical sciences universiti...



Publication date: 2011